TextPipe: Online Help
    Data Mining
 


 

 

What is Data Mining?

Data mining (or text mining) is the processing of an information source to extract specific data from it. Examples include:

  • Process a web site to extract product catalog and cost information. This can then be used to compare prices between different suppliers.
  • Process a web site to extract email addresses or web URLs.
  • Harvest the data on a web site for your own purposes.

The extracted data is usually formatted so that it can be loaded easily into a database for further analysis.
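As a rough illustration of what such an extract involves, the following Python sketch pulls email addresses and URLs out of page text and writes them as CSV with a header record. It is not how TextPipe works internally; the regular expressions are deliberately simplified, and the function names and sample page are made up for this example.

```python
import csv
import io
import re

# Simplified patterns for illustration only; real addresses and URLs
# have more edge cases than these expressions cover.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
URL_RE = re.compile(r"https?://[^\s\"'<>]+")

def extract_records(text):
    """Return sorted, de-duplicated (kind, value) rows found in text."""
    rows = {("email", m) for m in EMAIL_RE.findall(text)}
    rows |= {("url", m) for m in URL_RE.findall(text)}
    return sorted(rows)

def to_csv(rows):
    """Serialize rows as CSV with a header record, ready for a database import."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["kind", "value"])
    writer.writerows(rows)
    return buf.getvalue()

# Hypothetical page content, invented for this example.
sample = (
    '<a href="http://example.com/catalog">Catalog</a> '
    "Contact: sales@example.com or sales@example.com"
)
print(to_csv(extract_records(sample)))
```

Collecting the matches in a set removes repeated values before the rows are written, so the resulting file can be imported without a separate de-duplication pass.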


How can TextPipe help?

TextPipe can generate an extract from any text data source, including web sites. It can also perform data cleansing or other additional processing, for example:

  • add a header record (e.g. provide column titles for .CSV files)
  • remove unwanted data
  • replace specific text
  • convert line endings to DOS, Unix or Mac format
  • expand tabs
  • fix capitalization
  • convert from EBCDIC to ASCII
  • remove multiple whitespace
  • remove columns, lines or fields
  • remove duplicate records
  • sort
  • extract email addresses from specific fields
  • discard records matching a pattern
  • and much more
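To make a few of these steps concrete, here is a minimal Python sketch that chains tab expansion, whitespace removal, pattern-based discarding, duplicate removal, sorting and a header record. In TextPipe these are configured as filters rather than written as code; the `cleanse` function, its parameters and the sample records are hypothetical and exist only for this illustration.

```python
import re

def cleanse(lines, header, discard_pattern):
    """Apply a handful of cleansing steps to a list of text records."""
    discard = re.compile(discard_pattern)
    seen = set()
    cleaned = []
    for line in lines:
        line = line.expandtabs(4)                    # expand tabs
        line = re.sub(r" {2,}", " ", line).strip()   # remove multiple whitespace
        if not line:                                 # skip records that are now empty
            continue
        if discard.search(line):                     # discard records matching a pattern
            continue
        if line in seen:                             # remove duplicate records
            continue
        seen.add(line)
        cleaned.append(line)
    cleaned.sort()                                   # sort
    return [header] + cleaned                        # add a header record

# Hypothetical input records, invented for this example.
records = [
    "widget\t9.99",
    "gadget   4.50",
    "widget\t9.99",
    "# comment line",
    "gadget   4.50",
]
for row in cleanse(records, header="name price", discard_pattern=r"^#"):
    print(row)
```

The header is added last so that it is not sorted in with the data or mistaken for a duplicate.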


How can Offline Explorer help?

MetaProducts Offline Explorer (OE) can download your favorite Web, FTP and HTTPS sites (up to 100 simultaneously) for later offline viewing, editing or browsing; OE has a built-in browser. It lets you selectively include or exclude individual servers, directories and files using only keywords. OE is known for its user interface and for being one of the fastest Web site downloaders on the market. It supports industry-standard technologies such as FTP, HTTPS, various proxy servers, Macromedia Flash, cookies and XML, and it includes a built-in HTTP server that lets you share downloaded files over an intranet. Together, these features make Offline Explorer a leader in offline browsing.

For data mining applications, Offline Explorer can run TextPipe once a web site has finished downloading. OE knows which file types contain text, so it tells TextPipe to process only those files.

TextPipe can also be used to reduce the size of downloaded web sites:

  • remove line endings (convert line feeds from DOS/Unix/Mac to None)
  • remove multiple whitespace
  • remove whitespace at the start and end of each line
  • remove blank lines
  • remove binary characters
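The effect of these size-reduction steps can be sketched in Python as follows. This is an illustration rather than TextPipe's own implementation: the `shrink` function and the sample page are hypothetical, "binary characters" is assumed here to mean ASCII control characters, and line endings are normalized to a single line feed instead of being removed entirely, so that the output stays readable.

```python
import re

def shrink(text):
    """Reduce text size by stripping control characters, extra whitespace and blank lines."""
    # Remove binary (control) characters, keeping tab, LF and CR.
    text = re.sub(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]", "", text)
    out = []
    for line in text.splitlines():                   # accepts DOS, Unix and Mac line endings
        line = re.sub(r"[ \t]+", " ", line).strip()  # collapse runs, trim both ends
        if line:                                     # drop blank lines
            out.append(line)
    return "\n".join(out)

# Hypothetical downloaded page, invented for this example.
sample = (
    "<html>\r\n  <body>\r\n\r\n"
    "    <p>Hello    world</p>\x00\r\n"
    "  </body>\r\n</html>\r\n"
)
small = shrink(sample)
print(small)
print(len(sample), "->", len(small), "characters")
```

Collapsing whitespace changes how preformatted sections of a page are displayed, so a step like this suits pages that will be mined for data more than pages kept for viewing.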


 

 Copyright © 1999-2005 Crystal Software Australia. All rights reserved.